Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema inference parameterized types #32757

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

reuvenlax
Copy link
Contributor

@reuvenlax reuvenlax commented Oct 11, 2024

Beam automatically infers schema for common Java types: POJOs, JavaBean, AutoValue, AVRO, among others. In addition for it's use with schema transforms, this also provides a simple, efficient way of using these types in PCollections without needing to construct a Coder. A frequent complaint has been that this inference did not work in the presence of generic type parameters. E.g. this means that while Beam can handle PCollections of this class:

@DefaultSchema(JavaFieldSchema.class)
class MyType {
String field1;
}
PCollection myTypes = read();

Schema inference would fail for this:

@DefaultSchema(JavaFieldSchema.class)
class MyType<T> {
T field1;
}
PCollection<MyType> myTypes = read();

This PR adds support for generic type parameters to schema inference for common types, addressing the above issue.

Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@reuvenlax reuvenlax force-pushed the schema_inference_parameterized_types branch from 1957eea to 1366db3 Compare October 13, 2024 16:17
Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality looks sound and tests look great

Left some nits

There's few changes to public, non-internal, method signatures that are breaking though. Might make sense to overload these methods just in case (would also narrow down the scope of this PR)


if (type instanceof TypeVariable) {
TypeVariable<?> typeVariable = (TypeVariable<?>) type;
return Preconditions.checkArgumentNotNull(boundTypes.get(typeVariable));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to fall back on type if boundTypes doesn't contain it for whatever reason? Instead of failing here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to, as that would simply be hiding a bug which might cause other problems (often caused by a call to getType() instead of getGenericType()).

Type[] typeArguments = parameterizedType.getActualTypeArguments();
;
if (typeArguments.length != typeVariables.length) {
throw new RuntimeException("Unmatching arguments lengths");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: include the typeDescriptor in the error message for easier debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -243,4 +255,45 @@ public static TypeDescriptor boxIfPrimitive(TypeDescriptor typeDescriptor) {
? TypeDescriptor.of(Primitives.wrap(typeDescriptor.getRawType()))
: typeDescriptor;
}

public static <T> Map<Type, Type> getAllBoundTypes(TypeDescriptor<T> typeDescriptor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment/javadoc for this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +129 to +130
public static FieldValueTypeInformation forField(
Field field, int index, Map<Type, Type> boundTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a breaking change to a public method (not marked with @Internal). Are we okay with that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with some other changes to method signatures in this class: forGetter() and forSetter()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think so. This is a logically an internal class, which was only marked public because we needed to access it from other packages (e.g. the protobuf package). The fact that it was not marked @internal was likely a mistake, and I think it's fine to change it in this case.

@@ -161,7 +162,8 @@ private static boolean matchConstructor(
// Verify that constructor parameters match (name and type) the inferred schema.
for (Parameter parameter : constructor.getParameters()) {
FieldValueTypeInformation type = typeMap.get(parameter.getName());
if (type == null || type.getRawType() != parameter.getType()) {
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cleanup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 75 to +78
public static Schema schemaFromPojoClass(
TypeDescriptor<?> typeDescriptor, FieldValueTypeSupplier fieldValueTypeSupplier) {
return StaticSchemaInference.schemaFromClass(typeDescriptor, fieldValueTypeSupplier);
TypeDescriptor<?> typeDescriptor,
FieldValueTypeSupplier fieldValueTypeSupplier,
Map<Type, Type> boundTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: breaking change to a public method's signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Internal annotation. This was never meant to be used outside of Beam internals.

.filter(m -> !Modifier.isProtected(m.getModifiers()))
.filter(m -> !Modifier.isStatic(m.getModifiers()))
.forEach(methods::add);
c = c.getSuperclass();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this functionality change to get all methods in the superclass chain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an existing bug that was uncovered by the new unit tests I added in this PR. JavaFieldSchema (for POJOs) finds all fields in superclasses and adds them to the schema, but we didn't do that for JavaBeans. I changed this to make things consistent and fix the broken tests.

BTW related to your other comments - we should probably mark all of these. helper classes as Internal. They were never meant for external users to use them, and this change (among others) is at least a functionality change

Comment on lines +89 to +91
TypeDescriptor<?> typeDescriptor,
FieldValueTypeSupplier fieldValueTypeSupplier,
Map<Type, Type> boundTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another breaking method signature change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with fieldFromType() method below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Internal annotation. the fact that it wasn't so marked before was an oversight.

@@ -625,4 +626,41 @@ public void testSetterConstructionWithRenamedFields() throws NoSuchSchemaExcepti
assertEquals(
registry.getFromRowFunction(BeanWithCaseFormat.class).apply(row), beanWithCaseFormat);
}

@Test
public void testRegisterBeamWithTypeParameter() throws NoSuchSchemaException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
public void testRegisterBeamWithTypeParameter() throws NoSuchSchemaException {
public void testRegisterBeanWithTypeParameter() throws NoSuchSchemaException {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.addInt16Field("value5")
.build();
assertTrue(expectedSchema.equivalent(schema));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests for collection, map, nested types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants